Overview

Dataset Statistics

Number of Variables 13
Number of Rows 768
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 153.9 KB
Average Row Size in Memory 205.2 B
Variable Types
  • Numerical: 9
  • Categorical: 4
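The dataset-level counts above can be reproduced on any in-memory table. This is a minimal sketch under a few assumptions:

- Rows are plain Python tuples.
- A missing cell is stored as `None`. A pandas DataFrame would use NaN instead.
- The function name `overview_stats` is hypothetical.

Memory figures are left out because they depend on how the table is stored.

```python
from __future__ import annotations

from typing import Any, Sequence


def overview_stats(rows: Sequence[tuple[Any, ...]]) -> dict[str, float]:
    """Dataset-level counts in the spirit of the Dataset Statistics block.

    Assumes every row has the same number of columns, that a missing cell
    is represented as None, and that all cells are hashable.
    """
    n_rows = len(rows)
    n_vars = len(rows[0]) if rows else 0
    n_cells = n_rows * n_vars
    missing = sum(cell is None for row in rows for cell in row)
    # A row is a duplicate if an identical row appeared earlier, so the
    # duplicate count is the row count minus the distinct row count.
    duplicates = n_rows - len(set(rows))
    return {
        "variables": n_vars,
        "rows": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_rows if n_rows else 0.0,
    }
```

For example, `overview_stats([(1, 2.0), (1, 2.0), (3, None)])` reports one missing cell out of six and one duplicate row out of three.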

Dataset Insights

Pregnancies is skewed
SkinThickness is skewed
Insulin is skewed
Age is skewed

Variables


Pregnancies

numerical

Approximate Distinct Count 16
Approximate Unique (%) 2.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 12288
Mean 4.2786
Minimum 1
Maximum 17
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Pregnancies is skewed right (γ1 = 1.1096)
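Each variable's summary block reports simple counts over one column. This is a minimal sketch under a few assumptions:

- Missing cells are `None` or NaN.
- Zeros, negatives, mean, minimum and maximum are taken over finite values only.
- Distinct values are counted exactly, whereas the report labels its figure "approximate".
- The function name `column_overview` is hypothetical.

For Pregnancies, 16 distinct values over 768 rows gives 100 × 16 / 768 ≈ 2.1%, the reported Approximate Unique (%).

```python
from __future__ import annotations

import math
from typing import Optional


def column_overview(values: list[Optional[float]]) -> dict[str, float]:
    """Per-column counts like those in each variable's summary block.

    Missing cells are None or NaN. Infinities are counted separately and
    excluded from zeros, negatives, mean, min and max.
    """
    n = len(values)
    present = [v for v in values if v is not None and not math.isnan(v)]
    finite = [v for v in present if not math.isinf(v)]
    distinct = len(set(present))
    return {
        "distinct": distinct,
        "unique_pct": 100 * distinct / n,
        "missing": n - len(present),
        "missing_pct": 100 * (n - len(present)) / n,
        "infinite": len(present) - len(finite),
        "zeros": sum(v == 0 for v in finite),
        "negatives": sum(v < 0 for v in finite),
        "mean": sum(finite) / len(finite),
        "min": min(finite),
        "max": max(finite),
    }
```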

Quantile Statistics

Minimum 1
5th Percentile 1
Q1 2
Median 3
Q3 6
95th Percentile 10
Maximum 17
Range 16
IQR 4
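Some quartiles in the report are fractional, for example Glucose Q1 = 99.75. That indicates interpolation between neighboring sorted values. This is a minimal sketch assuming the linear (n - 1)p rule, which NumPy and pandas use by default. The report does not say which rule it uses. For Pregnancies, Range = 17 - 1 = 16 and IQR = 6 - 2 = 4.

```python
from __future__ import annotations

import math


def percentile(sorted_vals: list[float], p: float) -> float:
    """p-th percentile (0-100) by linear interpolation between order statistics.

    The position is (n - 1) * p / 100, the same rule as the default
    "linear" method in NumPy and pandas.
    """
    pos = (len(sorted_vals) - 1) * p / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def quantile_stats(values: list[float]) -> dict[str, float]:
    """Quantile Statistics block for one numeric column (sketch)."""
    s = sorted(values)
    q1, q3 = percentile(s, 25), percentile(s, 75)
    return {
        "min": s[0],
        "p5": percentile(s, 5),
        "q1": q1,
        "median": percentile(s, 50),
        "q3": q3,
        "p95": percentile(s, 95),
        "max": s[-1],
        "range": s[-1] - s[0],  # Maximum - Minimum
        "iqr": q3 - q1,         # Q3 - Q1
    }
```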

Descriptive Statistics

Mean 4.2786
Standard Deviation 3.0215
Variance 9.1296
Sum 3286
Skewness 1.1096
Kurtosis 0.732
Coefficient of Variation 0.7062
  • Pregnancies is not normally distributed (p-value 8.58e-13)
  • Pregnancies has 14 outliers
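The Descriptive Statistics block can be reproduced from raw values in a few lines. This is a minimal sketch under a few assumptions:

- Standard deviation uses the sample (n - 1) denominator, as pandas does by default.
- Skewness is the moment-based g1.
- Kurtosis is excess kurtosis, which is 0 for a normal distribution.

Libraries differ on bias corrections. pandas corrects skewness and kurtosis, while SciPy's defaults do not. The last digits may therefore differ from the report. As a check, the Coefficient of Variation for Pregnancies is 3.0215 / 4.2786 ≈ 0.7062.

```python
from __future__ import annotations

import math
import statistics


def descriptive_stats(values: list[float]) -> dict[str, float]:
    """Moment-based summary like the Descriptive Statistics block (sketch)."""
    n = len(values)
    mean = statistics.fmean(values)
    var = statistics.variance(values)  # sample variance, n - 1 denominator
    # Population central moments used for the shape statistics.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    std = math.sqrt(var)
    return {
        "mean": mean,
        "std": std,
        "variance": var,
        "sum": math.fsum(values),
        "skewness": m3 / m2 ** 1.5,    # Fisher-Pearson g1, no bias correction
        "kurtosis": m4 / m2 ** 2 - 3,  # excess kurtosis
        "cv": std / mean,              # coefficient of variation
    }
```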

Glucose

numerical

Approximate Distinct Count 135
Approximate Unique (%) 17.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 12288
Mean 121.6758
Minimum 44
Maximum 199
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Glucose is skewed right (γ1 = 0.5327)

Quantile Statistics

Minimum 44
5th Percentile 80
Q1 99.75
Median 117
Q3 140.25
95th Percentile 181
Maximum 199
Range 155
IQR 40.5

Descriptive Statistics

Mean 121.6758
Standard Deviation 30.4363
Variance 926.3654
Sum 93447
Skewness 0.5327
Kurtosis -0.2646
Coefficient of Variation 0.2501

BloodPressure

numerical

Approximate Distinct Count 47
Approximate Unique (%) 6.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 12288
Mean 72.25
Minimum 24
Maximum 122
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • BloodPressure is skewed right (γ1 = 0.1738)

Quantile Statistics

Minimum 24
5th Percentile 52
Q1 64
Median 72
Q3 80
95th Percentile 90
Maximum 122
Range 98
IQR 16

Descriptive Statistics

Mean 72.25
Standard Deviation 12.1172
Variance 146.8266
Sum 55488
Skewness 0.1738
Kurtosis 1.063
Coefficient of Variation 0.1677
  • BloodPressure is not normally distributed (p-value 6.23e-04)
  • BloodPressure has 14 outliers
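The report does not name the normality test behind its "not normally distributed" bullets. A small p-value means a sample this far from normal would be unlikely if the variable really were normal. To illustrate, this is a minimal sketch of a swapped-in test, Jarque-Bera, which combines skewness and excess kurtosis. It is not necessarily the test the report uses and will not necessarily reproduce the p-values above. The function name `jarque_bera` is used here for illustration only.

```python
from __future__ import annotations

import math


def jarque_bera(values: list[float]) -> tuple[float, float]:
    """Jarque-Bera normality test; returns (statistic, p-value).

    JB = n/6 * (S**2 + K**2 / 4), with S the skewness and K the excess
    kurtosis. Under normality JB is asymptotically chi-squared with 2
    degrees of freedom, whose survival function is exp(-x / 2).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    s = m3 / m2 ** 1.5
    k = m4 / m2 ** 2 - 3
    jb = n / 6 * (s ** 2 + k ** 2 / 4)
    return jb, math.exp(-jb / 2)
```

A strongly right-skewed sample, such as fifty 1s and five 100s, gives a p-value far below 0.05.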

SkinThickness

numerical

Approximate Distinct Count 50
Approximate Unique (%) 6.5%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 12288
Mean 26.4479
Minimum 7
Maximum 99
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • SkinThickness is skewed right (γ1 = 1.2141)

Quantile Statistics

Minimum 7
5th Percentile 14.35
Q1 20
Median 23
Q3 32
95th Percentile 44
Maximum 99
Range 92
IQR 12

Descriptive Statistics

Mean 26.4479
Standard Deviation 9.7339
Variance 94.7483
Sum 20312
Skewness 1.2141
Kurtosis 3.6819
Coefficient of Variation 0.368
  • SkinThickness is not normally distributed (p-value 1.34e-21)
  • SkinThickness has 9 outliers

Insulin

numerical

Approximate Distinct Count 185
Approximate Unique (%) 24.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 12288
Mean 118.2708
Minimum 14
Maximum 846
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Insulin is skewed right (γ1 = 3.2785)

Quantile Statistics

Minimum 14
5th Percentile 50
Q1 79
Median 79
Q3 127.25
95th Percentile 293
Maximum 846
Range 832
IQR 48.25

Descriptive Statistics

Mean 118.2708
Standard Deviation 93.2438
Variance 8694.4116
Sum 90832
Skewness 3.2785
Kurtosis 13.9787
Coefficient of Variation 0.7884
  • Insulin is not normally distributed (p-value 2.98e-24)
  • Insulin has 89 outliers
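The report does not state how it defines an outlier. One common rule is Tukey's fences, which flag values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. Applied to the Insulin quartiles above, the fences would be 79 - 1.5 × 48.25 = 6.625 and 127.25 + 1.5 × 48.25 = 199.625. This is a minimal sketch of that rule; treating it as the report's rule is an assumption.

```python
from __future__ import annotations

import math


def _percentile(sorted_vals: list[float], p: float) -> float:
    # Linear interpolation at position (n - 1) * p / 100.
    pos = (len(sorted_vals) - 1) * p / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    s = sorted(values)
    q1, q3 = _percentile(s, 25), _percentile(s, 75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]
```

The outlier count in the report would then be `len(iqr_outliers(column))`.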

BMI

numerical

Approximate Distinct Count 248
Approximate Unique (%) 32.3%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 12288
Mean 32.4508
Minimum 18.2
Maximum 67.1
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • BMI is skewed right (γ1 = 0.5999)

Quantile Statistics

Minimum 18.2
5th Percentile 22.235
Q1 27.5
Median 32
Q3 36.6
95th Percentile 44.395
Maximum 67.1
Range 48.9
IQR 9.1

Descriptive Statistics

Mean 32.4508
Standard Deviation 6.8754
Variance 47.2708
Sum 24922.19
Skewness 0.5999
Kurtosis 0.9075
Coefficient of Variation 0.2119
  • BMI has 8 outliers

DiabetesPedigreeFunction

numerical

Approximate Distinct Count 517
Approximate Unique (%) 67.3%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 12288
Mean 0.4719
Minimum 0.078
Maximum 2.42
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • DiabetesPedigreeFunction is skewed right (γ1 = 1.9162)

Quantile Statistics

Minimum 0.078
5th Percentile 0.1404
Q1 0.2437
Median 0.3725
Q3 0.6262
95th Percentile 1.1328
Maximum 2.42
Range 2.342
IQR 0.3825

Descriptive Statistics

Mean 0.4719
Standard Deviation 0.3313
Variance 0.1098
Sum 362.401
Skewness 1.9162
Kurtosis 5.5508
Coefficient of Variation 0.7022
  • DiabetesPedigreeFunction is not normally distributed (p-value 1.30e-06)
  • DiabetesPedigreeFunction has 29 outliers

Age

numerical

Approximate Distinct Count 52
Approximate Unique (%) 6.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 12288
Mean 33.2409
Minimum 21
Maximum 81
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Age is skewed right (γ1 = 1.1274)

Quantile Statistics

Minimum 21
5th Percentile 21
Q1 24
Median 29
Q3 41
95th Percentile 58
Maximum 81
Range 60
IQR 17

Descriptive Statistics

Mean 33.2409
Standard Deviation 11.7602
Variance 138.303
Sum 25529
Skewness 1.1274
Kurtosis 0.6312
Coefficient of Variation 0.3538
  • Age is not normally distributed (p-value 1.31e-14)
  • Age has 9 outliers

Outcome

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.3%
Missing 0
Missing (%) 0.0%
Memory Size 57564
  • The most frequent value (No Diabetes) occurs about 1.87 times as often as the second most frequent value (Diabetes)
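The frequency bullets compare the two most common values of a column. The character counts further down imply 500 "No Diabetes" rows (Space Separator 500, one space each) and 268 "Diabetes" rows. The ratio here is therefore 500 / 268 ≈ 1.866. This is a minimal sketch; the function name `frequency_insights` is hypothetical, and it assumes at least two distinct values.

```python
from __future__ import annotations

from collections import Counter


def frequency_insights(values: list[str]) -> dict[str, object]:
    """Compare the two most frequent values of a categorical column."""
    (top, n_top), (second, n_second) = Counter(values).most_common(2)
    return {
        "largest": top,
        "second": second,
        "ratio": n_top / n_second,  # how many times as often the top value occurs
        "top2_share_pct": 100 * (n_top + n_second) / len(values),
    }
```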

Length

Mean 9.9531
Standard Deviation 1.4309
Median 11
Minimum 8
Maximum 11

Sample

1st row Diabetes
2nd row No Diabetes
3rd row Diabetes
4th row No Diabetes
5th row Diabetes

Letter

Count 7144
Lowercase Letter 5876
Space Separator 500
Uppercase Letter 1268
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (No Diabetes, Diabetes) account for over 50.0% of rows
  • The most frequent word (diabetes) occurs about 1.54 times as often as the second most frequent word (no)

BMI_Category

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.5%
Missing 0
Missing (%) 0.0%
Memory Size 54755
  • The most frequent value (Obese) occurs about 2.81 times as often as the second most frequent value (Overweight)

Length

Mean 6.2956
Standard Deviation 2.0761
Median 5
Minimum 5
Maximum 11

Sample

1st row Obese
2nd row Overweight
3rd row Normal
4th row Overweight
5th row Obese

Letter

Count 4835
Lowercase Letter 4067
Space Separator 0
Uppercase Letter 768
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Obese, Overweight) account for over 50.0% of rows
  • The most frequent word (obese) occurs about 2.81 times as often as the second most frequent word (overweight)
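BMI_Category is presumably derived from BMI, but the report does not give the cut-offs. This is a minimal sketch assuming the WHO adult bands:

- Underweight: below 18.5
- Normal: 18.5 to under 25
- Overweight: 25 to under 30
- Obese: 30 and above

Only Obese, Overweight and Normal appear in the sample rows, so the fourth label "Underweight" is also an assumption. The names `BMI_EDGES`, `BMI_LABELS` and `bmi_category` are hypothetical.

```python
import bisect

# Hypothetical cut-offs (WHO adult bands); the report does not state them.
BMI_EDGES = [18.5, 25.0, 30.0]
BMI_LABELS = ["Underweight", "Normal", "Overweight", "Obese"]


def bmi_category(bmi: float) -> str:
    # bisect_right places a value equal to an edge in the higher band,
    # e.g. 25.0 -> "Overweight" and 30.0 -> "Obese".
    return BMI_LABELS[bisect.bisect_right(BMI_EDGES, bmi)]
```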

Glucose_to_BMI

numerical

Approximate Distinct Count 733
Approximate Unique (%) 95.4%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 12288
Mean 3.8694
Minimum 1.479
Maximum 8.2553
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Glucose_to_BMI is skewed right (γ1 = 0.6527)

Quantile Statistics

Minimum 1.479
5th Percentile 2.2793
Q1 3.0536
Median 3.75
Q3 4.5618
95th Percentile 5.7633
Maximum 8.2553
Range 6.7763
IQR 1.5082

Descriptive Statistics

Mean 3.8694
Standard Deviation 1.1027
Variance 1.216
Sum 2971.7112
Skewness 0.6527
Kurtosis 0.5903
Coefficient of Variation 0.285
  • Glucose_to_BMI is not normally distributed (p-value 1.73e-04)
  • Glucose_to_BMI has 10 outliers
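The report does not define Glucose_to_BMI. Its name suggests the row-wise ratio Glucose / BMI, and this minimal sketch assumes exactly that; the function names are hypothetical. If the assumption holds, the reported mean (3.8694) is a mean of ratios. It is expected to differ from the ratio of the column means, 121.6758 / 32.4508 ≈ 3.75.

```python
from __future__ import annotations

import statistics


def glucose_to_bmi(glucose: list[float], bmi: list[float]) -> list[float]:
    """Hypothetical derivation of Glucose_to_BMI as the row-wise ratio."""
    if len(glucose) != len(bmi):
        raise ValueError("glucose and bmi must have the same length")
    if any(b <= 0 for b in bmi):
        raise ValueError("BMI must be positive")
    return [g / b for g, b in zip(glucose, bmi)]


def mean_ratio_vs_ratio_of_means(
    glucose: list[float], bmi: list[float]
) -> tuple[float, float]:
    # The mean of row-wise ratios generally differs from the ratio of the
    # column means, so the two summaries are not interchangeable.
    ratios = glucose_to_bmi(glucose, bmi)
    return statistics.fmean(ratios), statistics.fmean(glucose) / statistics.fmean(bmi)
```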

Glucose_Level

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.4%
Missing 0
Missing (%) 0.0%
Memory Size 7204
  • The most frequent value (Medium) occurs about 1.76 times as often as the second most frequent value (Low)

Length

Mean 4.6836
Standard Deviation 1.3109
Median 4
Minimum 3
Maximum 6

Sample

1st row High
2nd row Low
3rd row High
4th row Low
5th row Medium

Letter

Count 3597
Lowercase Letter 2829
Space Separator 0
Uppercase Letter 768
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Medium, Low) account for over 50.0% of rows
  • The most frequent word (medium) occurs about 1.76 times as often as the second most frequent word (low)

Age_Category

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.4%
Missing 0
Missing (%) 0.0%
Memory Size 7210
  • The most frequent value (Young) occurs about 1.54 times as often as the second most frequent value (Middle-Aged)

Length

Mean 6.8984
Standard Deviation 3.08
Median 5
Minimum 3
Maximum 11

Sample

1st row Middle-Aged
2nd row Middle-Aged
3rd row Middle-Aged
4th row Young
5th row Middle-Aged

Letter

Count 5028
Lowercase Letter 3990
Space Separator 0
Uppercase Letter 1038
Dash Punctuation 270
Decimal Number 0
  • The top 2 categories (Young, Middle-Aged) account for over 50.0% of rows
  • The most frequent word (young) occurs about 1.54 times as often as the second most frequent word (middleaged, i.e. Middle-Aged with the hyphen removed)
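The second bullet in each Letter-related block works on words rather than whole values. The words are lowercased and punctuation is dropped, which is why Middle-Aged becomes middleaged. This is a minimal sketch assuming that tokenization: split on whitespace, then remove non-alphanumeric characters. The function name `word_frequencies` is hypothetical. On Outcome it counts diabetes 768 times (it appears in every value) and no 500 times, a ratio of about 1.54.

```python
from __future__ import annotations

import re
from collections import Counter


def word_frequencies(values: list[str]) -> Counter:
    """Count words after lowercasing and removing non-alphanumeric characters.

    Assumed tokenization: split on whitespace, then drop punctuation inside
    each token, which turns "Middle-Aged" into "middleaged".
    """
    words: Counter = Counter()
    for value in values:
        for token in value.lower().split():
            word = re.sub(r"[^a-z0-9]", "", token)
            if word:
                words[word] += 1
    return words
```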

Interactions

Correlations

Missing Values